Searching of real-time internet content responsive to a structured search query generated based on user-specified search terms/phrases and private database records matching initial user-selected constraints

ABSTRACT

Systems and methods for an improved Internet search engine are provided. According to one embodiment, a search request includes (i) one or more initial constraints to constrain a search of real-time Internet content to a domain name space and (ii) one or more user-specified search terms or phrases. A subset of potential outcomes is provided as feedback to a user by generating a structured search query to be applied to the content within the domain name space by: (i) identifying company records matching the initial constraint(s) and satisfying one or more sampling criteria by searching a company database; and (ii) incorporating into the structured search query: (a) domain names of matching company records; and (b) the user-specified search terms or phrases. The subset of potential outcomes is then displayed to the user by performing a limited search of the real-time Internet content based on the structured search query.

CROSS-REFERENCE TO RELATED PATENTS

This application is a continuation of U.S. patent application Ser. No. 16/373,125, filed on Apr. 2, 2019, which is hereby incorporated by reference in its entirety for all purposes.

COPYRIGHT NOTICE

Contained herein is material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the patent disclosure by any person as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights to the copyright whatsoever. Copyright © 2019-2021, Onemata Corporation.

BACKGROUND

Field

Embodiments of the present invention generally relate to Internet search technologies. In particular, embodiments of the present invention relate to retrieval of real-time Internet content responsive to a structured search query generated based on private database records matching one or more initial user-selected constraints (e.g., characteristics or attributes of companies of interest) and one or more user-specified search terms/phrases and enabling local filtering and ranking of the retrieved real-time results based on refinements to the user-specified search terms/phrases and user-specified scoring of the search terms/phrases, respectively.

Description of the Related Art

The Internet is defined as the worldwide interconnection of individual networks operated by government, industry, academia, and private parties. As of December 2018, there were approximately 1.94 billion websites in the world. The sheer number of websites and the constant evolution and changes to these websites over time make it impractical to provide reasonably prompt search results of real-time (i.e., live) Internet content.

A survey by Netcraft in 2012 showed that the total number of websites launched in 2012 was 51 million, representing an average of approximately 140,000 websites being launched per day. Because real-time Internet content is being added at such a significant pace, search results provided by traditional Internet search engines (e.g., Google, Bing, Yahoo, Ask.com and the like), which rely on periodic crawling to discover new and updated pages to be added to their local indices, are necessarily stale as they have traded off the freshness and accuracy of the results for the speed of providing search results. For example, it may take four weeks or longer for a new website to be indexed by Google and stored in its local database.

Other limitations of traditional Internet search engines include search bias and the unstructured nature of their free-form search queries, which are restricted to text entered into a search text field. With respect to search bias, traditional search engines rank websites at least in part based on the concept of intrinsic authority (i.e., a website's purported relevance to a specific subject area or industry based on automated analytic algorithms), which in many cases is flawed and allows rankings to be manipulated by keyword stuffing, for example. Furthermore, rankings of search results are affected by the search engine companies' commercial interests, including their dependence on paid advertising, keyword bidding, and the promotion of their own products. Turning now to the unstructured textual input provided to traditional search engines, such input denies the search engine context that might be helpful to provide meaningful and relevant search results to the end user.

In view of the foregoing, there is a need in the art for improved Internet search technology that, among other things, provides results that are fresh at the time the search is performed, constrains search processing of real-time Internet content to enable reasonable search times by injecting context into the search process by way of structured search queries, and allows interactive filtering and ranking of the search results by end users, thereby providing end users with more relevant and actionable results.

SUMMARY

Systems and methods are described for an improved Internet search engine. According to one embodiment, a search request is received from an end user of a subscriber of a SaaS based Internet search engine. The search request includes (i) one or more initial constraints specified by the end user that are to be used by the SaaS based Internet search engine to constrain a search of real-time Internet content to be performed based on the search request to a domain name space and (ii) one or more user-specified search terms or phrases. Responsive to receipt of the search request, the end user is provided with feedback regarding a subset of potential outcomes of the search request by generating a structured search query containing a finite number of domain names defining the domain name space and one or more search terms or phrases to be applied to the real-time Internet content within the domain name space by: (i) identifying company records matching the one or more initial constraints and satisfying one or more sampling criteria by searching a company database having stored therein characteristics/attributes regarding multiple companies; and (ii) incorporating into the structured search query: (a) multiple domain names each representing a website domain of a company of those of the multiple companies associated with the identified matching records by extracting the multiple domain names from the identified matching company records; and (b) the one or more user-specified search terms or phrases. The subset of potential outcomes is then displayed to the end user by performing a limited search of the real-time Internet content based on the structured search query.

Other features of embodiments of the present disclosure will be apparent from the accompanying drawings and the detailed description that follows.

BRIEF DESCRIPTION OF THE DRAWINGS

In the Figures, similar components and/or features may have the same reference label. Further, various components of the same type may be distinguished by following the reference label with a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.

FIG. 1 is a context level diagram illustrating external interactions with/by a SaaS based Internet search engine in accordance with an embodiment of the present invention.

FIG. 2 is a system level diagram conceptually illustrating an architecture of a SaaS based Internet search engine in accordance with an embodiment of the present invention.

FIG. 3 is a high-level flow diagram illustrating search processing in accordance with an embodiment of the present disclosure.

FIG. 4 is a flow diagram illustrating structured search query generation processing in accordance with an embodiment of the present disclosure.

FIG. 5 is a flow diagram illustrating real-time Internet content acquisition processing in accordance with an embodiment of the present disclosure.

FIG. 6 is a flow diagram illustrating web page retrieval processing in accordance with an embodiment of the present disclosure.

FIG. 7 illustrates an exemplary architecture of an Internet search engine in accordance with an embodiment of the present invention.

FIG. 8A illustrates a portion of an Ideal Customer Profile (ICP) screen of a user interface of a SaaS based Internet search engine in accordance with an embodiment of the present invention.

FIG. 8B illustrates a portion of the Ideal Customer Profile (ICP) screen after the end user has specified a mandatory term/phrase in accordance with an embodiment of the present invention.

FIG. 8C illustrates a portion of the Ideal Customer Profile (ICP) screen after the end user has specified multiple ideal terms/phrases in accordance with an embodiment of the present invention.

FIG. 8D illustrates a portion of the Ideal Customer Profile (ICP) screen after the end user has specified multiple exclusionary terms/phrases in accordance with an embodiment of the present invention.

FIG. 8E illustrates a portion of a prospect list screen of a user interface of a SaaS based Internet search engine in accordance with an embodiment of the present invention.

FIG. 8F illustrates a portion of a search result filtering screen of a user interface of a SaaS based Internet search engine in accordance with an embodiment of the present invention.

FIG. 8G illustrates a portion of a search result ranking screen of a user interface of a SaaS based Internet search engine in accordance with an embodiment of the present invention.

FIG. 8H illustrates a portion of a search result ranking screen of a user interface of a SaaS based Internet search engine in accordance with an alternative embodiment of the present invention.

FIG. 9 illustrates an exemplary computer system in which or with which embodiments of the present invention may be utilized.

DETAILED DESCRIPTION

Systems and methods are described for an improved Internet search engine. In the following description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the present invention. It will be apparent, however, to one skilled in the art that embodiments of the present invention may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form.

Embodiments of the present invention include various steps, which will be described below. The steps may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor programmed with the instructions to perform the steps. Alternatively, depending upon the particular implementation, various steps may be performed by a combination of hardware, software, firmware and/or by human operators.

Embodiments of the present invention may be provided as a computer program product, which may include a non-transitory machine-readable storage medium embodying thereon instructions, which may be used to program a computer (or other electronic devices) to perform a process. The machine-readable medium may include, but is not limited to, fixed (hard) drives, magnetic tape, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, semiconductor memories, such as ROMs, random access memories (RAMs), programmable read-only memories (PROMs), erasable PROMs (EPROMs), electrically erasable PROMs (EEPROMs), flash memory, magnetic or optical cards, or other type of media/machine-readable medium suitable for storing electronic instructions (e.g., computer programming code, such as software or firmware).

Various methods described herein may be practiced by combining one or more non-transitory machine-readable storage media containing the code according to embodiments of the present invention with appropriate special purpose or standard computer hardware to execute the code contained therein. An apparatus for practicing various embodiments of the present invention may involve one or more computers (e.g., physical and/or virtual servers) (or one or more processors within a single computer) and storage systems containing or having network access to computer program(s) coded in accordance with various methods described herein, and the method steps associated with embodiments of the present invention may be accomplished by modules, routines, subroutines, or subparts of a computer program product.

While embodiments of the present invention are described herein with reference to a particular practical application involving performing searching of real-time Internet content to identify motivated prospects based on an Ideal Customer Profile (ICP) defined by an end user, embodiments of the present invention are applicable to searching of real-time and/or archived Internet content more generally. An end user or subscriber may wish to identify companies or products/services offered by such companies for purposes other than or in addition to customer prospecting, including, for example, for purposes of facilitating one or more of industry research, government administration, identifying potential merger and/or acquisition opportunities, recruiting/staffing, procurement, licensing, partnering, joint development, joint research, collaboration, and/or otherwise obtaining actionable business intelligence.

Terminology

Brief definitions of terms used throughout this application are given below.

The phrase “cloud service” refers to any service made available to users on demand via the Internet from physical or virtual servers of a cloud computing service provider as opposed to being provided from on-premises servers of an enterprise. Cloud services are designed to provide easy, scalable access to virtual hardware (e.g., Linux virtual machines, Windows virtual machines, blob storage, file storage, managed disks), software, infrastructure, applications (e.g., Single Sign-On, databases, developer tools) and other services and resources, and are fully managed by the cloud services provider. Non-limiting examples of cloud service providers and their respective cloud computing platforms include Amazon (Amazon Web Services), Kamatera (Kamatera cloud computing infrastructure), Microsoft Corporation (Microsoft Azure), Google (Google Cloud Platform), VMware (VMware Cloud), IBM (IBM cloud), Oracle (Oracle Cloud), Red Hat (Red Hat Cloud) and Rackspace (Rackspace Cloud).

The term “subscriber” generally refers to an entity or person that subscribes to Internet search services provided by an Internet search engine or Internet search provider.

The term “user” and the phrase “end user” generally refer to a person associated with a subscriber and having a user account with the Internet search engine or Internet search provider. Notably, a subscriber and an end user may effectively be one and the same for a subscribing entity having a single user. As such, “subscriber,” “user,” and “end user” may be used interchangeably herein. In other cases, depending upon the context, “subscriber” may refer to an entity having multiple end users.

A “computer” or “computer system” may be one or more physical computers, virtual computers, or computing devices. As an example, a computer may be one or more server computers, cloud-based computers, cloud-based cluster of computers, virtual machine instances or virtual machine computing elements such as virtual processors, storage and memory, data centers, storage devices, desktop computers, laptop computers, mobile devices, or any other special-purpose computing devices. Any reference to “a computer” or “a computer system” herein may mean one or more computers, unless expressly stated otherwise.

The terms “connected” or “coupled” and related terms are used in an operational sense and are not necessarily limited to a direct connection or coupling. Thus, for example, two devices may be coupled directly, or via one or more intermediary media or devices. As another example, devices may be coupled in such a way that information can be passed therebetween, while not sharing any physical connection with one another. Based on the disclosure provided herein, one of ordinary skill in the art will appreciate a variety of ways in which connection or coupling exists in accordance with the aforementioned definition.

If the specification states a component or feature “may”, “can”, “could”, or “might” be included or have a characteristic, that particular component or feature is not required to be included or have the characteristic.

As used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.

The phrases “in an embodiment,” “according to one embodiment,” and the like generally mean the particular feature, structure, or characteristic following the phrase is included in at least one embodiment of the present disclosure, and may be included in more than one embodiment of the present disclosure. Importantly, such phrases do not necessarily refer to the same embodiment.

FIG. 1 is a context level diagram illustrating external interactions with/by a SaaS based Internet search engine 100 in accordance with an embodiment of the present invention. As described further below, in one embodiment, Internet search engine 100 may be implemented in the form of a powerful customer prospecting tool that facilitates the identification of motivated prospective customers by subscribers 110 a-n. In the context of the present example, Internet search engine 100 may operate in accordance with a SaaS software distribution model in which a third-party (an Internet search service provider or SaaS provider) hosts an application and associated data that facilitates searching of real-time Internet content by subscribers 110 a-n. Subscribers 110 a-n may represent corporate entities including a number of end users 111 a-n that interact with Internet search engine 100 via the Internet (not shown) or may represent individual end users. As explained below, Internet search engine 100 may make use of data contained in one or more external private or public data sources 130 a-n to constrain the domain name space of real-time Internet content searched responsive to search queries issued by the end users of subscribers 110 a-n so as to make practical the delivery of both fresh and accurate search results in a reasonable amount of time.

The private data sources may include, but are not limited to, databases of company information (company databases), potentially a separate database for each language supported by Internet search engine 100, proprietary databases (e.g., a subscriber's list of companies or prospects and/or Customer Relationship Management (CRM) data, social media data, commercially available customer prospecting databases, and the like), and government or compiled data. Internet search engine 100 may retrieve data from external private/public data sources 130 a-n by issuing queries via web services and/or web application programming interfaces (APIs) provided by the owners/hosts of such data sources. As one non-limiting example, the web APIs may be implemented in the form of Representational State Transfer (REST)ful APIs.

FIG. 2 is a system level diagram conceptually illustrating an architecture of a SaaS based Internet search engine 200 in accordance with an embodiment of the present invention. In the present example, subscribers (e.g., subscribers 110 a-n) are provided with Internet content search services by Internet search engine 200 via the Internet 201. According to the intentionally simplified and conceptual illustration provided by FIG. 2, Internet search engine 200 includes one or more web servers 210, one or more search servers 215, one or more application servers 220 and data storage 230. As embodiments of the present disclosure are focused primarily on user interface enhancements and other methodologies (e.g., query generation and Internet content processing) that seek to improve Internet searching to facilitate practical searching of real-time Internet content, for sake of brevity an exemplary architecture that may be utilized by the SaaS provider for hosting the application and data centrally is provided simply for context and is described at a high level.

Web server(s) 210 provide HyperText Transfer Protocol (HTTP) level service and serve as an interface between an end user's browser and Internet search engine 200. As those skilled in the art appreciate, web servers receive HTTP requests and respond to them in order to carry out website functions, in addition to hosting the website and storing its static content, such as images, JavaScript, Cascading Style Sheets (CSS), and HyperText Markup Language (HTML) pages.

Application servers 220 provide web server support and handle all application operations between end users and backend applications or databases (e.g., search results database 231, subscriber database 233, and internal databases 235 a-n) within data storage 230. For example, application servers 220 may be responsible for generating dynamic content for presentation to end users via their browsers. Application servers may also handle other functions, such as user authentication in accordance with policies established by the subscribers and the Internet search service provider. For purposes of supporting scalability and multitenancy, while a common version of the application providing for searching of real-time Internet content may be used by all subscribers, the application may be installed on multiple machines or machine instances.

Search servers 215 are responsible for performing Internet search processing as described in further detail below, including, but not limited to, executing structured search queries generated by application servers 220 responsive to search requests initiated by end users.

In the current example, data storage 230 is shown including a search results database 231, a subscriber database 233, and multiple internal databases 235 a-n. Search results database 231 may provide storage for search results returned by search servers 215 in response to search requests made by end users. For example, search results database 231 may store search results (e.g., a list of companies and associated Internet content) and various sub-datasets of search results provided by search servers 215 on behalf of subscribers. Subscriber database 233 may store subscriber-specific data, including, among other things, user names of end users and corresponding login credentials and access rights, for each subscriber. Internal databases 235 a-n may represent local cached versions or subsets of various external private/public data sources (e.g., external private/public data sources 130 a-n). One or more of internal databases 235 a-n may also represent a private company database generated by the SaaS provider based on purchased data and/or mined Internet data. Those skilled in the art will appreciate a greater or lesser number of databases may be used depending upon the particular implementation and desired distribution/isolation of data.

According to one embodiment, the architecture depicted in FIG. 2 may be implemented entirely within a local or on-premises server farm controlled by the SaaS provider or entirely within a third-party cloud service. Alternatively, a hybrid architecture may be employed in which one or more portions of the architecture depicted in FIG. 2 are distributed among the local or on-premises server farm and one or more third-party cloud services. In such a hybrid architecture, requests from subscribers can be handled locally until a predetermined or configurable resource usage threshold is reached with overflow in excess of the resource usage threshold being offloaded to the one or more cloud services.
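By way of non-limiting illustration only, the overflow behavior of such a hybrid architecture might be sketched as follows. The function name `choose_backend`, the representation of resource usage as a fraction between 0 and 1, and the default threshold of 0.8 are assumptions made for this sketch and are not taken from the disclosure.

```python
def choose_backend(local_usage: float, threshold: float = 0.8) -> str:
    """Pick where to handle a subscriber request.

    Hypothetical helper: `local_usage` is the fraction (0.0 to 1.0) of local
    server farm capacity currently in use; `threshold` stands in for the
    predetermined or configurable resource usage threshold.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    # Requests stay local until the threshold is reached; overflow goes to cloud.
    return "local" if local_usage < threshold else "cloud"
```

A request arriving while local usage is 0.5 would be handled locally under this sketch, whereas one arriving at or above the threshold would be offloaded.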

FIG. 3 is a high-level flow diagram illustrating search processing in accordance with an embodiment of the present disclosure. In the context of the present example, search processing is initiated responsive to receipt of a search request at block 310. As described further below, the search request may be derived from input provided by an end user of an Internet search engine (e.g., Internet search engine 100) via a browser-based user interface. In one embodiment, the search request specifies one or more initial constraints and one or more search terms and/or phrases. Among other things, the one or more initial constraints limit the scope of Internet content that is to be searched, thereby making the performance of a search of real-time Internet content feasible within a reasonable amount of time. The one or more search terms and/or phrases may include individual terms or multiple terms (phrases) that must be present, must not be present and/or are preferably present within Internet content in order for the Internet content to be considered a match in the context of the search query.
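By way of non-limiting illustration only, a search request of the kind received at block 310 might be represented as sketched below. The class name `SearchRequest`, its field names, and the case-insensitive substring matching are assumptions made for this sketch; the disclosure states only that the request carries initial constraints and terms/phrases that must be present, must not be present and/or are preferably present.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SearchRequest:
    # Hypothetical structure; field names are illustrative only.
    constraints: dict[str, str]                             # initial constraints
    mandatory: list[str] = field(default_factory=list)      # must be present
    ideal: list[str] = field(default_factory=list)          # preferably present
    exclusionary: list[str] = field(default_factory=list)   # must not be present


def is_match(request: SearchRequest, page_text: str) -> bool:
    """Return True if the content satisfies the mandatory/exclusionary terms.

    Ideal terms do not affect matching here; they would only inform ranking.
    Matching is a simple case-insensitive substring test (an assumption).
    """
    text = page_text.lower()
    if any(term.lower() in text for term in request.exclusionary):
        return False
    return all(term.lower() in text for term in request.mandatory)


request = SearchRequest(
    constraints={"industry": "solar"},
    mandatory=["solar"],
    ideal=["battery storage"],
    exclusionary=["residential"],
)
```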

At block 320, a structured search query is generated. The structured search query may represent a query that is to be performed in whole or in part by one or more downstream processing resources (e.g., one or more physical or virtual servers specifically allocated to meet the processing requirements associated with the structured search query). In one embodiment, the structured search query includes a list of domain names of websites that are to be searched for the one or more search terms and/or phrases and is generated based on the search request received at block 310 and one or more private data sources. As described further below, since even the performance of a constrained real-time Internet content search is a time-consuming process that may take from a few hours to several hours depending upon various factors including computing resources allocated to or otherwise applied to the search, the number of websites and the number of webpages within the websites being scanned, in one embodiment, a configurable or predetermined parameter (e.g., selectable by the end user via the browser-based interface or a default parameter for all new searches) may intentionally limit a structured search query generated responsive to an initial search request (or until the end user specifies otherwise) to a relatively small sample set of websites meeting a sampling criterion (e.g., those associated with 100, 200 or 500 domain names, those associated with companies in a particular geographic area (state, city, or postal code (e.g., a ZIP code)), or those associated with companies within a predetermined or configurable radius of a particular location (e.g., address, postal code, or coordinates)). In this manner, the end user can be provided with prompt feedback (on the order of a few to several minutes) regarding a subset of potential outcomes of their search request and be provided with an opportunity to revise it accordingly rather than submitting a search request that is potentially overly inclusive and having to wait several hours to find out the search request is not consistent with the end user's intent. A non-limiting example of structured search query generation processing is described further below with reference to FIG. 4.
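By way of non-limiting illustration only, the generation of a sample-limited structured search query might be sketched as follows. The names `CompanyRecord`, `StructuredQuery`, and `build_structured_query`, the record fields, the equality-based constraint matching, and the use of an in-memory list in place of a company database are all assumptions made for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CompanyRecord:
    # Hypothetical record layout standing in for a company database row.
    name: str
    domain: str
    industry: str
    state: str


@dataclass(frozen=True)
class StructuredQuery:
    domains: tuple[str, ...]  # finite domain name space to be searched
    terms: tuple[str, ...]    # user-specified search terms/phrases


def build_structured_query(records, constraints, terms, sample_limit=100):
    """Build a query from records matching every constraint.

    `sample_limit` models a domain-count sampling criterion; passing None
    removes the limit, as when a full search is confirmed.
    """
    matching = [
        r for r in records
        if all(getattr(r, key) == value for key, value in constraints.items())
    ]
    if sample_limit is not None:
        matching = matching[:sample_limit]
    # Extract domain names, dropping duplicates while preserving order.
    domains = tuple(dict.fromkeys(r.domain for r in matching))
    return StructuredQuery(domains=domains, terms=tuple(terms))


records = [
    CompanyRecord("Acme", "acme.example", "solar", "CA"),
    CompanyRecord("Beta", "beta.example", "solar", "CA"),
    CompanyRecord("Gamma", "gamma.example", "solar", "TX"),
    CompanyRecord("Delta", "delta.example", "retail", "CA"),
]
sample = build_structured_query(records, {"industry": "solar"}, ["battery storage"], 2)
```

In this sketch, a geographic sampling criterion could be expressed simply as an additional constraint (e.g., on `state`), while the domain-count criterion is expressed through `sample_limit`.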

At block 330, real-time Internet content acquisition is performed based on and responsive to the structured search query generated at block 320. As those skilled in the art will appreciate, there are a number of ways to distribute the workload associated with performing real-time Internet content acquisition. For example, the unit of work to be allocated to available processing resources may be at the domain, sub-domain or web page level. A non-limiting example of real-time Internet content acquisition is described further below with reference to FIG. 5.
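By way of non-limiting illustration only, one simple way to allocate domain-level units of work across available processing resources is a round-robin partition, sketched below. The function name `partition_work` and the round-robin policy are assumptions made for this sketch; the disclosure does not prescribe a particular distribution scheme.

```python
def partition_work(units, worker_count):
    """Distribute units of work (e.g., domain names) round-robin.

    Returns one list per worker. The same helper could be applied to
    sub-domains or web page URLs if a finer unit of work is preferred.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    buckets = [[] for _ in range(worker_count)]
    for index, unit in enumerate(units):
        buckets[index % worker_count].append(unit)
    return buckets
```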

At block 340, the search results produced by block 330 are presented to the end user and stored so as to allow further local manipulation (e.g., filtering, ranking, exporting and the like). In one embodiment, a list of companies having website content meeting the search query is caused to be displayed to the end user via a browser-based interface. As described further below, the list of companies may be presented in an interactive and hierarchical form in which various search statistics and/or various details regarding each company are available and can be revealed and/or hidden responsive to the end user selecting an interface element associated with the company of interest to expand the hierarchy to reveal the underlying layer of information for the particular company or collapse the hierarchy to hide the underlying layer of information for the particular company.

Depending upon the particular implementation, various operations may be performed on the search results. In the context of the present example, the operations include performing a new search, revising the existing search, filtering the search results, exporting the search results, and ranking the search results. To the extent the current search results were generated to provide the end user with prompt feedback for a limited sample set, the end user can also be provided with the option of confirming his/her desire to continue with a full search based on the current search request. At decision block 350, responsive to receipt of a selection of a search result operation from the end user via the browser-based interface, a determination is made regarding which operation has been requested by the end user. If the selected operation is to perform a new search or revise the existing search, processing loops back to block 310. If the selected operation is to perform filtering on the search results, then processing continues with block 360. If the selected operation is to export the current search results, then processing continues with block 370. If the selected operation is to perform ranking on the search results, then processing continues with block 380. If the selected operation is to perform a full search (e.g., confirming the end user is satisfied the limited sample set of search results is consistent with his/her intent), then processing loops back to block 320 where the constraints on the number of domain names included within the structured search query and/or geographical limitations (e.g., ZIP code) are removed and the full real-time search is launched.
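By way of non-limiting illustration only, the determination made at decision block 350 might be sketched as a lookup from the selected operation to the block at which processing continues. The enumeration `Operation` and the function `next_block` are hypothetical names introduced for this sketch; the block numbers are those of FIG. 3 as described above.

```python
from enum import Enum


class Operation(Enum):
    # Hypothetical names for the operations selectable by the end user.
    NEW_SEARCH = "new_search"
    REVISE_SEARCH = "revise_search"
    FILTER = "filter"
    EXPORT = "export"
    RANK = "rank"
    FULL_SEARCH = "full_search"


# Block of FIG. 3 at which processing continues for each operation.
NEXT_BLOCK = {
    Operation.NEW_SEARCH: 310,
    Operation.REVISE_SEARCH: 310,
    Operation.FILTER: 360,
    Operation.EXPORT: 370,
    Operation.RANK: 380,
    Operation.FULL_SEARCH: 320,  # sampling limits are removed before regeneration
}


def next_block(operation: Operation) -> int:
    """Return the block number to which decision block 350 transfers control."""
    return NEXT_BLOCK[operation]
```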

At block 360, filtering parameter(s) are applied and the displayed search results are refreshed. In one embodiment, one or more user-selected filtering parameters are received from the end user via the browser-based interface and applied to the current set of search results and the filtered set of search results is presented to the end user via the browser-based interface. The filtering parameter(s) may include refinements/edits to the originally provided one or more search terms and/or phrases and/or the inclusion/exclusion of various additional company characteristics or attributes. As described further below, with reference to FIG. 8F, search results may be over inclusive as a result of user-specified mandatory or ideal search terms and phrases being used on companies' websites in a different context than anticipated by the end user, for example. In such a scenario, the end user may specify one or more company characteristics or attributes, for example, for exclusionary filtering to cause undesired companies to be excluded from future search results. After the filtering parameter(s) are applied and the search results are redisplayed, processing loops back to decision block 350 where the end user can continue to refine the results as desired.
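By way of non-limiting illustration only, exclusionary filtering of locally stored search results by company characteristics or attributes might be sketched as follows. The function name `apply_exclusions`, the representation of each result as a dictionary, and the attribute names used in the example are assumptions made for this sketch.

```python
def apply_exclusions(results, excluded):
    """Drop results having any excluded attribute value.

    `results` is a list of dicts describing companies; `excluded` maps an
    attribute name to the set of values that should be filtered out.
    The stored results are not modified; a new filtered list is returned.
    """
    return [
        company for company in results
        if not any(company.get(attr) in values for attr, values in excluded.items())
    ]


results = [
    {"name": "Acme", "industry": "solar", "state": "CA"},
    {"name": "Beta", "industry": "staffing", "state": "CA"},
    {"name": "Gamma", "industry": "solar", "state": "TX"},
]
filtered = apply_exclusions(results, {"industry": {"staffing"}})
```

Because the filter operates on results already retrieved and stored, no additional real-time content acquisition is needed to refresh the display in this sketch.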

At block 380, ranking parameter(s) are applied and the displayed search results are refreshed. In one embodiment, the end user can control the order in which the search results are displayed by weighting the search terms and/or phrases associated with the search request. As described further below, with reference to FIG. 8H, the relative importance of ideal search terms/phrases can be defined by the end user by assigning scores (e.g., from 1 to 10) for each occurrence of a particular search term and/or phrase within the content of the websites. These scores can then be aggregated and the companies within the search results can be ranked based on their relative aggregated scores.
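By way of non-limiting illustration only, the aggregation of user-assigned per-occurrence scores and the resulting ranking might be sketched as follows. The function names, the case-insensitive literal matching, and the summation used as the aggregation are assumptions made for this sketch; other aggregation schemes are possible.

```python
import re


def aggregate_score(text, term_scores):
    """Sum (score x number of occurrences) over all weighted terms/phrases.

    `term_scores` maps each ideal term/phrase to its user-assigned score
    (e.g., 1 to 10). Matching is case-insensitive and literal (an assumption).
    """
    lowered = text.lower()
    return sum(
        score * len(re.findall(re.escape(term.lower()), lowered))
        for term, score in term_scores.items()
    )


def rank_companies(site_text, term_scores):
    """Return company names ordered from highest to lowest aggregated score."""
    return sorted(
        site_text,
        key=lambda company: aggregate_score(site_text[company], term_scores),
        reverse=True,
    )


site_text = {
    "Beta": "We offer storage.",
    "Acme": "Solar panels, solar inverters and storage.",
}
term_scores = {"solar": 5, "storage": 2}
ranking = rank_companies(site_text, term_scores)
```

In this example, the website content for "Acme" contains two occurrences of "solar" and one of "storage" for an aggregated score of 12, whereas "Beta" scores 2.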

At block 370, the current set of search results, as filtered and/or ranked in accordance with the end user's specified filtering parameter(s) and ranking parameter(s), is exported to a user-selected modality. For example, the end user may export the current set of search results in PDF format, to an Excel file or a Word file. Alternatively or additionally, the end user may store the current search results as one of a variety of sub-data sets for future reference. In some embodiments, the Internet search engine may perform a periodic background process (e.g., daily, every other day, weekly, or the like) to refresh and/or update one or more of the stored search results based on the associated structured search query and any user-specified filtering and ranking parameters. In this manner, during subsequent sessions with the Internet search engine, the end user will have fresh search results for stored search sets.

FIG. 4 is a flow diagram illustrating structured search query generation processing in accordance with an embodiment of the present disclosure. As noted above in the Background, performance of search processing of real-time Internet content associated with potentially multiple billions of websites is currently impractical within reasonable search times. Existing Internet search engines assume end users are willing to trade off accuracy and freshness of search results in exchange for identification and delivery of potentially millions of search results in under one second. Freshness and accuracy are, however, important enough to certain Internet search activities (e.g., the identification of companies that might represent motivated prospective customers, clients or other potential business opportunities) that search times on the order of a few hours to several hours are acceptable. In one embodiment, such search times can be achieved by requiring an end user-initiated search request to be constrained in some manner so as to allow a structured search query to be generated that is limited to a finite number of identifiable domains.

The processing steps described below represent an example of processing that may be performed by block 320 of FIG. 3. As such, for purposes of the present example, it is assumed an end user-initiated search request has been previously received that includes one or more initial constraints and has been made available to the structured search query generation processing (e.g., by way of a function or procedure call, inter-process communication (IPC), message passing via a shared memory or one or more queues through which multiple asynchronous processes communicate, or other mechanism).

At block 410, one or more private data sources are searched for records matching the initial constraint(s) specified by the user-initiated search request. The private data sources may include, but are not limited to, databases of company information (company databases), potentially a separate database for each language supported by the Internet search engine, proprietary databases (e.g., a subscriber's list of companies or prospects and/or CRM data), and government or compiled data. In one embodiment, a private data source includes a record for each company including at least the company name, a website domain and a number of additional fields for storing company attributes/characteristics. In one embodiment, the one or more initial constraints may represent characteristics or attributes of companies of interest to the end user. Those skilled in the art will appreciate that searching a database based on an attribute on which indexing has been performed can be performed more efficiently in terms of time, at the cost of the additional storage required for the data structures maintained for the index. As those skilled in the art will also appreciate, there are numerous characteristics or attributes of companies that might be used to distinguish among companies. For example, the one or more constraints may relate to one or more of geography (e.g., a location, specified by one or more of a country, state, city and/or ZIP code, an address or the like), distance from a particular location (e.g., businesses within 10 to 20 miles from the subscribing company's address, ZIP code, coordinates (e.g., GPS coordinates, latitude and longitude or the like) or other user-specified address, ZIP code or coordinates), a type of industry, a number or a range of employees, an annual revenue or range, whether the company is public or private (i.e., whether shares of the company are publicly traded), the company's ownership (e.g., woman owned, veteran owned, minority business enterprise, etc.) and a year in which the company was founded or timeframe since the company was founded.

In any event, for purposes of illustration, assume the private data source being searched is a commercially available company database used for marketing and/or lead generation, for example, containing information regarding millions of US and/or foreign companies, including, among other information, company names, information identifying a type of industry (e.g., North American Industry Classification System (NAICS) code(s), Standard Industrial Classification (SIC) code(s) and/or corresponding textual descriptions), SIC2 category, SIC4 category, SIC8 category, state code, city, ZIP code, address, phone number, domain name, market, revenue data (e.g., annual revenue), coordinates, location type (e.g., branch, headquarters, etc.), market variable, and employment data (e.g., number of employees). In this manner, an end user-initiated search query specifying an initial constraint as “industry type=Computer Software & Hardware,” for example, can be used to query the private data source to identify domain names of those companies in the specified industry.

At block 420, the domain names associated with the matching records identified in block 410 are incorporated into the structured search query.

As noted above, with reference to FIG. 3, in one embodiment, the structured search query may initially be defaulted to produce a relatively small sample set so as to give the end user prompt feedback regarding the nature of their search request, which could be unintentionally over- or under-inclusive. In this manner, the end user may be provided with an opportunity to fine-tune their search request before performing a full search of real-time Internet content for all companies. In such an embodiment, when a sample search is to be performed (e.g., as indicated by a global flag or a parameter communicated to the structured search query generation processing in some other manner), rather than incorporating all of the domain names associated with the matching records identified in block 410, the domain names associated with the first X (where X is a predetermined or configurable parameter) matches may be incorporated into the structured search query being built. Alternatively or additionally, the sample search may be limited to domain names associated with matching records within a user-specified ZIP code. For example, a combination of a ZIP code and a predefined numerical limit may be used and the geographic area can be expanded outwardly until the predefined numerical limit is achieved or until all domain names associated with the matching records identified in block 410 are used.
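One way to picture the sample-set selection is the sketch below. It assumes, purely for illustration, that each matching record already carries a precomputed `distance_miles` from the user-specified ZIP code; that field, the function name `select_sample_domains`, and the sample records are hypothetical. Taking the nearest records first until the limit is met is equivalent to expanding the geographic area outwardly until the numerical limit is reached or the matching records are exhausted.

```python
def select_sample_domains(records, limit, sample=True):
    """Choose the domain names to place in the structured search query.

    records: matching company records, each a dict with hypothetical
             keys "domain" and "distance_miles".
    limit:   the predefined numerical limit (X) for a sample search.
    sample:  False requests a full search over every matching record.
    """
    if not sample:
        return [r["domain"] for r in records]
    # Nearest first: equivalent to growing the search radius outward
    # until `limit` domains are collected or no records remain.
    nearest_first = sorted(records, key=lambda r: r["distance_miles"])
    return [r["domain"] for r in nearest_first[:limit]]


# Illustrative records only.
matches = [
    {"domain": "far.example", "distance_miles": 40.0},
    {"domain": "near.example", "distance_miles": 2.0},
    {"domain": "mid.example", "distance_miles": 12.5},
]
sample_domains = select_sample_domains(matches, limit=2)
```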

At block 430, the user-specified search terms or phrases associated with the search request are also incorporated into the structured search query.
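Blocks 410 through 430 can be summarized with a minimal sketch. The record layout, the exact-match comparison of constraints, and the names `StructuredSearchQuery` and `build_structured_query` are assumptions made for illustration; an actual embodiment may query an indexed database and support range or distance constraints.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredSearchQuery:
    """Hypothetical container for the generated query."""
    domains: list = field(default_factory=list)
    terms: list = field(default_factory=list)


def build_structured_query(company_records, constraints, terms):
    # Block 410: find records matching every initial constraint
    # (simple equality matching is assumed here).
    matches = [
        record for record in company_records
        if all(record.get(key) == value for key, value in constraints.items())
    ]
    query = StructuredSearchQuery()
    # Block 420: incorporate the domain names of the matching records.
    for record in matches:
        if record["domain"] not in query.domains:
            query.domains.append(record["domain"])
    # Block 430: incorporate the user-specified search terms/phrases.
    query.terms.extend(terms)
    return query


# Illustrative records only.
records = [
    {"name": "Acme Irrigation", "domain": "acme.example", "industry": "Irrigation"},
    {"name": "Byte Works", "domain": "byte.example", "industry": "Software"},
    {"name": "Rain Pros", "domain": "rainpros.example", "industry": "Irrigation"},
]
query = build_structured_query(records, {"industry": "Irrigation"}, ["sprinkler"])
```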

FIG. 5 is a flow diagram illustrating real-time Internet content acquisition processing in accordance with an embodiment of the present disclosure. In the context of the present example, it is assumed the real-time Internet content acquisition processing is performed based on a structured search query specifying multiple domain names and one or more search terms/phrases. The structured search query may be generated, for example, by the process described above with reference to FIG. 4.

At block 510, information regarding a predetermined or configurable depth of web pages to be read from each domain and a predetermined or configurable timeframe in which the structured search query is desired to be completed is received. For example, instead of processing every web page of each domain, a limited number of web pages of each domain may be processed in accordance with a breadth first search (or depth first search). The desired timeframe for completing structured search queries may also be a configurable or predefined system parameter. For example, the company providing the Internet search engine may make various service level commitments to subscribers in accordance with a gold, silver and bronze service plan in which gold-level subscribers are expected to be provided with search results in under three hours, silver-level subscribers are expected to be provided with search results in under four hours and bronze-level subscribers are expected to be provided with search results in under five hours. Alternatively or additionally, subscribers may be offered the option of paying additional fees to accelerate search processing on a search-by-search or session-by-session basis.

Empirical data evaluated by the assignee of the present invention suggests AWS m5.large instances (with 2 virtual CPUs, 8 GB of memory, up to 3,500 Mbps of dedicated EBS bandwidth and network performance of up to 10 Gbps) are typically capable of processing web pages at a rate of about one web page per 0.6 seconds.

Based on this empirical data and the number of web pages that are expected to be processed based on the number of domains associated with the structured search query, the number of virtual servers (or “instances” in Amazon Web Services (AWS) Elastic Compute Cloud (EC2) parlance) required to fulfill the structured search query can be determined at block 520 for a given timeframe and the appropriate number of virtual servers can be dynamically launched to handle the structured search query.

At block 530, the structured search query is distributed among the virtual servers launched in block 520. In one embodiment, the structured search query is partitioned on a domain-by-domain basis. For example, multiple domain-based structured search queries can be created, each having a single one of the domains specified by the original structured search query. As described further below, in one embodiment, messages can be exchanged between search processing instances and a search manager instance via a message queue polled by the search processing instances, for example. In this manner, as search processing instances complete their work for a particular domain-based structured search query, they can pull another domain-based structured search query from the shared message queue. Alternatively, each virtual server can be assigned a subset of domains from the original structured search query to process; however, this approach may not make the most efficient use of resources, as some domains may end up being processed faster than others, resulting in some virtual servers being idle while others continue to process their allocated subset of domains.

At block 540, web page retrieval processing is performed. In one embodiment, each virtual server independently retrieves and scans the desired depth of web pages for a current website domain it is processing based on the one or more user-specified search terms/phrases. A non-limiting example of web page retrieval processing is described further below with reference to FIG. 6.

At block 550, the search results produced by the set of virtual servers launched to handle the structured search query are aggregated. In one embodiment, as each virtual server completes its processing of a domain-based structured search query for a particular domain of the list of domains contained in the original structured search query, it places its results onto a results queue. A search manager instance may pull the search results from the results queue and perform search result aggregation processing on behalf of a master controller.
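The pull-based distribution of block 530 and the aggregation of block 550 can be sketched with in-process queues and threads standing in for virtual servers and a cloud message queue. The function names `run_workers` and `process_domain` are hypothetical, and the stand-in `process_domain` callback replaces the actual web page retrieval of block 540.

```python
import queue
import threading


def run_workers(domains, process_domain, num_workers=3):
    """Distribute one domain-based query per domain and aggregate results."""
    request_queue = queue.Queue()
    results_queue = queue.Queue()
    for domain in domains:
        request_queue.put(domain)  # one domain-based query per domain

    def worker():
        # Each worker pulls the next domain as soon as it is free, so a
        # slow domain never leaves another worker idle.
        while True:
            try:
                domain = request_queue.get_nowait()
            except queue.Empty:
                return
            results_queue.put((domain, process_domain(domain)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Aggregation: drain the results queue once all workers have finished.
    aggregated = {}
    while not results_queue.empty():
        domain, result = results_queue.get()
        aggregated[domain] = result
    return aggregated


# Stand-in processing: the "result" is just the length of the domain name.
results = run_workers(["a.example", "b.example", "c.example", "d.example"], len)
```

The design choice mirrors the trade-off noted for block 530: pulling work from a shared queue balances load automatically, whereas pre-assigning fixed subsets of domains can leave some servers idle.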

As noted above, in one embodiment, because performing a search of real-time Internet content can require a significant amount of time, the search space, in terms of the number of domains, may be intentionally limited by the Internet search engine and/or by the end user to a relatively small sample set of websites so as to provide the end user with immediate feedback regarding the types of results that will be produced by the search request in its current form. In this manner, the end user is provided with an opportunity to revise a potentially overly broad or overly narrow search request without having to wait several hours only to find out the search request could have been defined in a more optimal manner. Those skilled in the art will appreciate there are numerous approaches for implementing this domain limitation. For example, each new search request may be initially limited by default and the end user can be provided with the ability to override this default in favor of a full search of those domains meeting his/her initial constraint(s). Alternatively, each new search request may be initially defaulted to a full search, unless a limited sample search is specifically requested by the end user. Further still, the number of domains to be included as part of the limited sample search may be a user-configurable or predefined parameter.

In any event, one option noted above for performing a limited sample search representative of the end user's current search request involves limiting the number of domains included within the structured search query. For purposes of generality, it is noted that various other mechanisms are contemplated for limiting the domain search space. For example, if a limited sample search is called for, prior to or as part of block 510, the structured search query at issue can be modified so as to limit the number of domain names to the desired sample size. Alternatively, as the domain-based structured search queries are generated in block 530, the number of domain-based structured search queries can be limited to the desired sample size and the resource calculations associated with block 520 can also be informed of the desired sample size so as to not over-allocate resources.

In one embodiment, the real-time Internet content processing may also include identification and creation of domain-based structured search queries for subdomain(s) associated with the domain names expressly identified within the structured search query. This may be enabled, for example, by way of a global flag or a parameter communicated to the real-time Internet content acquisition processing in some other manner. Alternatively, the structured search query may already include identification of subdomains as a result of such information being included in one or more of the private data sources, for example, or as a result of such subdomains being identified and included within the structured search query during structured search query generation processing, for example.

FIG. 6 is a flow diagram illustrating web page retrieval processing in accordance with an embodiment of the present disclosure. In the context of the present example, the unit of searching allocated to or performed by a search process or a search server is assumed to be at the level of a domain name. As discussed further below, in one embodiment, the web page retrieval processing is performed based on a domain-based structured search query pulled from a request queue by a search process or a search server. The domain-based structured search query may be generated, for example, by the process described above with reference to FIG. 5.

For purposes of determining relevance of a website's content to a set of search terms/phrases, it is not necessary to scan every web page associated with a website domain. As such, in one embodiment, a configurable or predetermined number of web pages (at times referred to herein as a depth of web pages or search depth) is sampled from each website domain being evaluated. In one embodiment, the default search depth is between approximately 10 and 100 web pages. The low end of this range does not mean it is the minimum number of web pages required to return satisfactory results and the high end of this range does not mean it is a maximum number of web pages. Those skilled in the art will appreciate the predetermined or configurable search depth simply represents a tradeoff among various factors, including, but not limited to, the speed of search query processing and the amount of virtual and/or physical resources (e.g., memory resources and processing resources) required. Empirical data suggests a search depth of 20 web pages produces satisfactory results in the majority of situations. Those skilled in the art will appreciate the search depth can be adjusted up or down for particular usage scenarios by an administrator of the Internet search engine or can be a parameter capable of being configured by end users (e.g., on a session-by-session or a search-by-search basis) or a subscriber account administrator (e.g., on a session-by-session or a subscriber-by-subscriber basis).

At block 610, starting with the home page, content associated with a website domain is retrieved and stored by traversing the website domain in accordance with a Breadth First Search. As web pages are retrieved, they are parsed to identify internal links (i.e., links to web pages within the website domain at issue). The retrieval and traversal continues until the desired web page depth has been reached or no further internal links are found. Those skilled in the art understand that Breadth First Search is one of many possible tree traversal algorithms. In alternative embodiments, one or more other tree traversal algorithms (e.g., Depth First Search, Fish Search, A* Search or Adaptive A* Search) or modifications thereof may be used alone or in various combinations.
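The traversal of block 610 can be sketched as a Breadth First Search bounded by the search depth. To keep the sketch self-contained, an in-memory link map stands in for actual HTTP retrieval and link parsing; the names `crawl_breadth_first`, `fetch_links`, and the sample site are hypothetical.

```python
from collections import deque


def crawl_breadth_first(home_page, fetch_links, max_pages):
    """Return up to max_pages pages in Breadth First Search order.

    fetch_links(page) is assumed to retrieve the page and return the
    internal links found on it.
    """
    retrieved = []
    seen = {home_page}
    frontier = deque([home_page])
    # Stop when the desired depth of web pages is reached or when no
    # further internal links remain.
    while frontier and len(retrieved) < max_pages:
        page = frontier.popleft()
        retrieved.append(page)
        for link in fetch_links(page):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                frontier.append(link)
    return retrieved


# Illustrative site structure only.
site = {
    "/": ["/about", "/products"],
    "/about": ["/team", "/"],
    "/products": ["/products/a"],
    "/team": [],
    "/products/a": [],
}
pages = crawl_breadth_first("/", lambda p: site.get(p, []), max_pages=4)
```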

At decision block 620, the retrieved web pages are scanned and evaluated to determine whether they satisfy match criteria associated with user-specified search terms or phrases. In one embodiment, the user-specified search terms or phrases include one or more of mandatory search terms or phrases, ideal search terms or phrases and exclusionary search terms or phrases. Mandatory search terms or phrases are those terms or phrases that must be present within the content of at least one web page of the website domain for the match criteria to be considered satisfied. Ideal search terms or phrases are those terms or phrases that the end user would reasonably expect to be present within the content of one or more web pages of a website domain, but they may or may not be present as companies may use different terminology than specifically called out by the end user. Exclusionary search terms or phrases are those terms or phrases that must not be present within the content of any web page of the website domain for the match criteria to be considered satisfied. When the match criteria are satisfied, processing continues with block 630; otherwise, processing continues with block 640.
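The match criteria of decision block 620 can be sketched as follows. Case-insensitive substring matching is an assumption made for brevity, as is the reading that each mandatory term may appear on a different page; the function name `satisfies_match_criteria` is hypothetical. Ideal terms are deliberately absent because they do not affect whether the criteria are satisfied.

```python
def satisfies_match_criteria(pages, mandatory=(), exclusionary=()):
    """Decide whether a domain's retrieved pages satisfy the criteria.

    pages: text content of the web pages retrieved for one domain.
    """
    texts = [page.lower() for page in pages]

    def appears(term):
        return any(term.lower() in text for text in texts)

    # An exclusionary term on any page disqualifies the domain.
    if any(appears(term) for term in exclusionary):
        return False
    # Every mandatory term must appear on at least one page.
    return all(appears(term) for term in mandatory)


# Illustrative page content only.
domain_pages = [
    "We install sprinkler systems.",
    "Our focus is water conservation.",
]
is_match = satisfies_match_criteria(domain_pages, mandatory=["sprinkler"])
```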

At block 630, the retrieved web pages from the web domain at issue are included as part of a collection of search results and returned to the upstream process or upstream server that requested the web page retrieval processing. The search results may also include information identifying the domain or company with which the search results are associated as well as other statistics associated with the website domain and/or web page content.

At block 640, the retrieved web pages are discarded as the content associated therewith has either been identified as specifically not of interest to the end user or did not meet the requirements for being deemed relevant to the end user.

While in the context of the simplified example presented in FIG. 6, all web pages expected to be processed for a given domain are shown as being retrieved at block 610 and scanned at decision block 620, in one embodiment, when one or more exclusionary conditions (e.g., one or more exclusionary terms or phrases) are part of the search request, for example, the web page retrieval processing can be performed more efficiently by applying the exclusionary conditions as individual web pages are retrieved. In this manner, web page retrieval processing can be stopped immediately upon determining the content associated with the domain includes an exclusionary term or phrase.

In one embodiment, the web page retrieval processing also includes counting occurrences of terms/phrases appearing on those web pages of a website domain that are evaluated. As such, the results returned can also include various statistics about the content of the website. For example, a configurable or predetermined number of most frequently appearing terms/phrases can be reported as part of the search results for a particular website domain. As discussed further below, all or a subset of these most frequently appearing terms/phrases can be presented to the end user to facilitate further refinement of his/her search request, performance of local filtering of the search results and/or performance of local ranking of the search results by aggregating scores assigned by the end user to particular terms/phrases and sorting the order in which companies are presented to the end user based on the aggregated scores as described further below.
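A minimal sketch of the occurrence counting is shown below. It counts single lowercase words only and applies no stop-word removal or phrase detection, both of which are simplifying assumptions; the function name `most_frequent_terms` and the sample text are hypothetical.

```python
import re
from collections import Counter


def most_frequent_terms(pages, top_n=3):
    """Return the top_n most frequent words across a domain's pages."""
    counts = Counter()
    for text in pages:
        # Tokenize on runs of letters; a real system would likely also
        # handle phrases and filter common words such as "and".
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(top_n)


# Illustrative page content only.
site_pages = [
    "Sprinkler repair and sprinkler design",
    "Drip irrigation and sprinkler install",
]
top_terms = most_frequent_terms(site_pages, top_n=2)
```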

In one embodiment, the web page retrieval processing and/or subsequent local post-processing of such web content may include evaluating technologies associated with the website domain. The use of certain web technologies by companies as reflected by the source code of their web content may make such companies more relevant to certain types of subscribers.

While in the context of FIGS. 5 and 6, searching is said to be “allocated” on the basis of a domain, those skilled in the art will appreciate website domains can be pre-parsed in accordance with the desired website traversal algorithm (e.g., Breadth First Search, Depth First Search, Fish Search, A* Search or Adaptive A* Search) and instead of queuing domain-based structured search queries for downstream processing, web page-based structured search queries, representing individually identified web pages, can be queued for downstream processing.

FIG. 7 illustrates an exemplary architecture 700 of an Internet search engine in accordance with an embodiment of the present invention in which at least some portion of search processing functionality (e.g., the processing associated with one or more of FIG. 4, FIG. 5 and FIG. 6) is implemented within a cloud service. As noted above, a non-limiting example of a cloud service is Amazon Web Services (AWS) with Elastic Compute Cloud (EC2), which provides scalable virtual servers (called instances) that facilitate resizable computing capacity capable of supporting a multi-tenant Internet search service and allowing the Internet search service to pay only for capacity that is actually used.

In the context of the present example, architecture 700 includes a master controller 710, multiple managers 720 a-x, multiple workers 730 a-y, a request queue 721 and a results queue 722. According to one embodiment, the Internet search engine may make use of dynamically resizable computing capacity provided by the cloud service provider to allow manager 720 a to dynamically launch a desired number of instances (represented in FIG. 7 as workers 730 a-y).

In the context of the present example, search requests initiated by end users of subscribers (e.g., received via a graphical user interface, such as that described further below) are assumed to be received by master controller 710 and distributed to dynamically instantiated managers 720 a-x. As noted above, in one embodiment, search requests are transformed into structured search queries specifying one or more domains. In the current example, assuming a structured search query specifying multiple domains has been assigned to manager 720 a by master controller 710, the number of workers 730 a-y to be instantiated by manager 720 a can be determined mathematically as described further below.

Each launched worker instance (e.g., workers 730 a-y) retrieves domain information from the manager's queue (e.g., request queue 721) to read through and process the appropriate domain. A non-limiting example of a queue service that may be used to create request queue 721 and results queue 722 is Amazon Simple Queue Service (SQS), which provides asynchronous messaging to allow various application components (in this case, manager 720 a and workers 730 a-y) to communicate in the cloud. The information gained from reading through the domain is stored on the worker instance (or passed back to manager 720 a via results queue 722), and the worker then retrieves new domain information from the manager's queue. This process continues until there are no more domains to receive from the manager's queue. At this point, the workers can upload their search data to files on Amazon S3. These files can later be downloaded by master controller 710 and entered into a search results database (e.g., search results database 231) of the Internet search engine.

In one embodiment, the structured search query can be transformed into multiple domain-based search queries (e.g., one domain-based search query for each of the multiple domains specified by the structured search query). The multiple domain-based search queries can be individually queued on request queue 721 of manager 720 a from which each of workers 730 a-y retrieves a domain-based search query when it is available to perform searching, thereby allowing the search processing to be distributed across workers 730 a-y. When workers 730 a-y have completed processing of a domain-based search query on the website domain at issue, they may communicate their search results to manager 720 a by placing the search results on results queue 722 of manager 720 a.

Returning now to how a manager (e.g., manager 720 a) can determine the number of workers 730 a-y to be instantiated, in one embodiment, information regarding the time required to complete previous searches is maintained to facilitate the making of predictions regarding future search durations. As such, in one embodiment, the appropriate number of workers 730 a-y can be allocated to finish the structured search query within a predetermined time constraint by utilizing the knowledge regarding the timing of previous searches. For instance, for a structured search query specifying 100,000 domains and an average time per domain of 0.1 seconds, it would take 10,000 seconds to complete, or approximately 2 hours and 45 minutes. This average time per domain accounts for 10 instances (e.g., workers 730 a-y), each assumed to be running 8 processes, for a total of 80 processes. These 80 processes can read through 10 domains per second or 1 domain every 0.1 seconds. Each individual process can therefore read through ⅛ of a domain per second. Utilizing this information, the appropriate number of instances can be allocated to complete the structured search query in the predetermined time constraint. For example, based on the average time per domain noted above, completing a structured search query specifying 100,000 domains in one hour would require 222 processes, completing those domains in two hours would require 111 processes, and completing those domains in three hours would require 74 processes. Since each worker utilizes 8 processes in this example, a one hour search requires 28 workers, a two hour search requires 14 workers, and a three hour search requires 10 workers. The number of processes per worker is chosen to utilize a majority of the currently selected (e.g., m5.large) instance's CPU without overloading it and could be higher or lower depending on which instance type is used and that instance's hardware specifications. As such, those skilled in the art will appreciate the above scenarios are provided for sake of completeness, but are not intended to be limiting.
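The worker arithmetic above can be reproduced with a short sketch. The per-process rate of ⅛ of a domain per second and the 8 processes per worker come from the example; the function name `workers_needed` and its parameter names are hypothetical, and the worker count is rounded up so the time constraint is met.

```python
import math


def workers_needed(num_domains, deadline_seconds,
                   domains_per_process_per_second=1 / 8,
                   processes_per_worker=8):
    """Estimate the number of worker instances for a time constraint."""
    # Overall throughput required to finish on time.
    required_rate = num_domains / deadline_seconds
    # Number of processes needed to sustain that throughput.
    processes = required_rate / domains_per_process_per_second
    # Round up to whole workers so the deadline is not missed.
    return math.ceil(processes / processes_per_worker)


one_hour = workers_needed(100_000, 1 * 3600)
two_hours = workers_needed(100_000, 2 * 3600)
three_hours = workers_needed(100_000, 3 * 3600)
```

With the example figures, the three calls yield 28, 14 and 10 workers, matching the one, two and three hour scenarios described above.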

Use Case/Example:

As noted above, as a result of limiting the scope of Internet content that is to be searched to a specific set of domain names, for example, identified by applying one or more user-specified initial search constraints to one or more private data sources, completion of a search of real-time Internet content becomes achievable within a reasonable amount of time and by extension numerous usage models of real-time Internet content also become feasible. While one specific use case relating to customer prospecting is described below with reference to exemplary user interface screen shots in order to illustrate a particular practical application and implementation relating to searching real-time Internet content, this use case is not intended to be limiting as the assignee contemplates numerous other search applications and objectives. For example, an end user or subscriber of the Internet search engine described herein may wish to identify companies or products/services offered by such companies for purposes other than or in addition to customer prospecting, including, but not limited to, for the purpose of performing industry, product or service research, government administration, identifying potential merger and/or acquisition opportunities, licensing, partnering, joint development, joint research, collaboration, and/or otherwise obtaining actionable business intelligence from real-time Internet content.

FIG. 8A illustrates a portion of an Ideal Customer Profile (ICP) screen 800 of a user interface of a SaaS-based Internet search engine in accordance with an embodiment of the present invention. In the context of the present example, ICP screen 800 guides an end user of a subscriber of the Internet search service through the process of creating an ICP. In one embodiment, an ICP is intended to describe companies (in terms of various characteristics/attributes and search terms/phrases) that have a problem (e.g., a need for a product/service) that is solved by the subscriber. In this manner, when an ICP is properly specified and/or refined, use of the ICP, which represents an example of a search request as described above with reference to FIG. 3, to generate a constrained search query with reference to one or more private data sources produces fresh data regarding what should be highly motivated prospects when applied to real-time Internet content.

ICP screen 800 includes a first user interface element 801 (e.g., a text entry field, a dropdown list, a list box, or the like) for specifying one or more initial constraints, a second user interface element 802 (e.g., a text entry field) for entering zero or more “nice to have terms,” a third user interface element 803 (e.g., a text entry field) for inputting zero or more “must have terms,” a fourth user interface element 804 (e.g., a text entry field) for submitting zero or more “must not have terms,” and a link 805 to a fifth user interface element (not shown) in the form of, for example, a text entry field, checkboxes, radio buttons, a dropdown list, a list box, or the like, for identifying one or more technologies desired to be reflected in the source code of a prospect's web site.

In one embodiment, the first user interface element 801 allows the subscriber to manually enter or select from a predefined list one or more industries or industry verticals.

For purposes of illustration, the subscriber may be a business that makes residential sprinkler system controls that conserve water. So, the subscriber may be looking for residential sprinkler installers to resell the subscriber's equipment. Given there are over 300,000 sprinkler installers in the United States, it is neither time effective nor cost effective for the subscriber to attempt to contact all of them by phone or direct mail, for example. As such, it would be helpful for the subscriber to narrow down the list of sprinkler installers.

In the present example, the subscriber knows their best installers will be those companies in the irrigation industry who are already passionate about and have built a business around water conservation. As such, the subscriber begins building the ICP by selecting or inputting the “irrigation” industry with the first user interface element 801. A discussion regarding creation of search terms and phrases by the subscriber continues with reference to FIGS. 8B-D.

FIG. 8B illustrates a portion of the Ideal Customer Profile (ICP) screen 800 after the end user has specified a mandatory term/phrase in accordance with an embodiment of the present invention. Continuing with the example subscriber seeking to identify appropriate residential sprinkler installers, the subscriber now uses third user interface element 803 to create one or more “must have terms” (also referred to herein as mandatory search terms/phrases). In the context of this example, a necessary, but not sufficient, condition for a company to be considered a match for the ICP is that all specified mandatory search terms or phrases be present within the content of at least one web page of the company's website domain.

In view of the definition of a mandatory search term/phrase, subscribers will need to take care not to include too many mandatory search terms/phrases as this can severely limit the number of search results returned. It is often best for a subscriber to start more broadly and then iteratively refine the ICP based on feedback from running one or more quick sample searches (e.g., limited in geography to a particular ZIP code, for example, or limited in scope to a relatively small number of domain names) before applying the ICP to real-time Internet content in a less constrained manner. As such, in the context of the present example, the subscriber starts by including only a single mandatory term, i.e., “sprinkler,” for third user interface element 803.

FIG. 8C illustrates a portion of the Ideal Customer Profile (ICP) screen 800 after the end user has specified multiple ideal terms/phrases in accordance with an embodiment of the present invention. Continuing with the example subscriber seeking appropriate residential sprinkler installers, the subscriber now uses second user interface element 802 to create one or more “nice to have terms” (also referred to herein as ideal search terms/phrases). In the context of this example, ideal search terms or phrases are those terms or phrases that the subscriber would reasonably expect to be present within the content of one or more web pages of a website domain of a prospect, but they are not required to be present within a company's website domain for the company to be considered a match for the ICP.

In connection with specifying ideal search terms/phrases, one challenge the subscriber will need to consider is that the businesses they seek to identify may talk about themselves in different terms, other than or in addition to water conservation, in their website content. For example, terms/phrases that might reasonably be expected to appear within website domains of companies in the irrigation industry that are interested in water conservation include “water savings,” “drought resistance,” “water efficiency,” “consumption reduction,” “lowering water bills” and the like.

Since there is little risk in over-inclusiveness in relation to ideal terms/phrases, in this example, the subscriber has specified all of the above-listed phrases as ideal search terms/phrases within second user interface element 802.

FIG. 8D illustrates a portion of the Ideal Customer Profile (ICP) screen 800 after the end user has specified multiple exclusionary terms/phrases in accordance with an embodiment of the present invention. Continuing with the example subscriber that makes residential sprinkler system controls for conserving water, the subscriber now uses fourth user interface element 804 to specify one or more “must not have terms” (also referred to herein as exclusionary search terms/phrases). Exclusionary search terms or phrases are those terms or phrases that must not be present within the content of any web page of a company's website domain for the company to be considered a match for the ICP.
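By way of non-limiting illustration, the three kinds of search terms/phrases described with reference to FIGS. 8B-D might be evaluated against the pages of a company's website domain as sketched below. The names `IdealCustomerProfile` and `matches_icp` are hypothetical, case-insensitive substring matching is a simplifying assumption, and the sketch assumes each mandatory term may appear on a different page of the domain; none of these details is prescribed by the embodiments described herein.

```python
from dataclasses import dataclass, field


@dataclass
class IdealCustomerProfile:
    """Hypothetical container for the three kinds of ICP terms/phrases."""
    must_have: list = field(default_factory=list)      # mandatory terms
    nice_to_have: list = field(default_factory=list)   # ideal terms
    must_not_have: list = field(default_factory=list)  # exclusionary terms


def _contains(page_text, term):
    # Assumption: case-insensitive substring matching is good enough here.
    return term.lower() in page_text.lower()


def matches_icp(pages, icp):
    """Return True if the pages of one website domain satisfy the ICP terms.

    pages: list of strings, one per web page of the company's domain.
    """
    # Mandatory: every term must appear on at least one page of the domain.
    for term in icp.must_have:
        if not any(_contains(page, term) for page in pages):
            return False
    # Exclusionary: no term may appear on any page of the domain.
    for term in icp.must_not_have:
        if any(_contains(page, term) for page in pages):
            return False
    # Ideal terms never disqualify a company; they only matter for ranking.
    return True


icp = IdealCustomerProfile(
    must_have=["sprinkler"],
    nice_to_have=["water savings", "drought resistance"],
    must_not_have=["office", "apartment", "commercial", "farm"],
)
print(matches_icp(["We install sprinkler systems for homes."], icp))
```

In this sketch a company lacking every ideal term still matches, which is consistent with the observation above that there is little risk in over-inclusiveness of ideal terms/phrases.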

Since the subscriber only deals with residential irrigation companies, in this example, the subscriber has decided to exclude the terms “office,” “apartment,” “commercial,” and “farm” by specifying these terms with the fourth user interface element 804. When the subscriber is done specifying one or more initial constraints, e.g., industry types or industry verticals, and entering desired search terms/phrases, e.g., one or more ideal terms/phrases, mandatory terms/phrases and/or exclusionary terms/phrases, the subscriber can submit the ICP to initiate performance of a search of real-time Internet content. As noted above, the search may be performed as a sample search (in this case, a sample ICP search, which can typically be completed within minutes) or a full ICP search, which can take on the order of a couple of hours to several hours, depending upon the number of companies in the one or more private data sources that satisfy the initial search constraint(s). In one embodiment, the subscriber can provide location information (e.g., geolocation information, address, or a ZIP code) to constrain the search for purposes of receiving relatively quick feedback on how the search will perform before incurring the longer timeframe for receiving results of a full ICP search. Depending upon the particular implementation, when a sample ICP search is requested, the Internet search engine may define a predetermined or configurable radius based on the specified location information and constrain the resulting structured search query generated based on the initial constraint(s) specified by the ICP and one or more private data sources. For example, the initial constraint(s) may initially be used to identify companies within the one or more private data sources meeting the initial constraints and further limit the identified companies to those located within the predetermined or configurable radius. Then, as described above, a structured search query may be generated for the sample ICP search based on a list of domain names corresponding to the limited set of identified companies whose content is to be searched for the user-specified search terms/phrases. Alternatively, if a full ICP search is to be performed, the structured search query is generated based on a list of domain names corresponding to the companies identified based on application of the initial search constraint(s) to the one or more private data sources.
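The difference between a sample ICP search and a full ICP search described above can be sketched as follows. This is an illustrative sketch only: the record fields (`industry`, `domain`, `lat`, `lon`), the function names, the dictionary form of the structured search query, and the use of a great-circle (haversine) distance to apply the radius are all assumptions made for the example rather than details of the described embodiments.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, used for the haversine formula


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def build_structured_query(companies, industries, terms,
                           center=None, radius_miles=25.0):
    """Build a query from company records (hypothetical dict-based records).

    When `center` (lat, lon) is given, this acts as a sample search limited
    to companies within `radius_miles`; otherwise it is a full search.
    """
    # Step 1: apply the initial constraint(s) to the private data source.
    matching = [c for c in companies if c["industry"] in industries]
    # Step 2 (sample search only): keep companies inside the radius.
    if center is not None:
        matching = [
            c for c in matching
            if haversine_miles(center[0], center[1], c["lat"], c["lon"])
            <= radius_miles
        ]
    # Step 3: the query pairs the extracted domain names with the terms.
    return {"domains": [c["domain"] for c in matching], "terms": list(terms)}


companies = [
    {"domain": "a.example", "industry": "irrigation",
     "lat": 40.0150, "lon": -105.2705},
    {"domain": "b.example", "industry": "irrigation",
     "lat": 34.0522, "lon": -118.2437},
    {"domain": "c.example", "industry": "tree trimming",
     "lat": 39.7400, "lon": -104.9900},
]
sample = build_structured_query(companies, {"irrigation"}, ["sprinkler"],
                                center=(39.7392, -104.9903), radius_miles=50.0)
full = build_structured_query(companies, {"irrigation"}, ["sprinkler"])
print(sample["domains"], full["domains"])
```

The only difference between the two calls is whether the geographic sampling criterion is applied before the domain names are extracted, which mirrors the sample-versus-full distinction drawn above.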

FIG. 8E illustrates a portion of prospect list screen 810 of a user interface of a SaaS based Internet search engine in accordance with an embodiment of the present invention. Prospect list screen 810 is shown including multiple tabs 811 a-d. In the present example, the currently active tab is prospects tab 811 a. The subscriber can easily switch between various screens of the user interface by selecting one of the other tabs 811 b-d. By selecting filter tab 811 b, the subscriber is presented with a search result filtering screen, an example of which is described below with reference to FIG. 8F. By selecting rank tab 811 c, a search result ranking screen is displayed to the subscriber, examples of which are described below with reference to FIGS. 8G and 8H. By selecting save tab 811 d, the subscriber may be presented with a save search results screen where the subscriber can save the current set of search results for future reference and/or export the search results to one or more file formats (e.g., comma separated values (CSV), Portable Document Format (PDF), or the like).

In the context of the present example, it is assumed the ICP specified with reference to FIGS. 8A-D has been used to generate a structured search query that has been used to search real-time Internet content of website domains associated with companies in one or more private data sources (e.g., a company database) meeting the initial search constraint(s). In one embodiment, the companies identified as having web content satisfying match criteria for the search terms/phrases by the Internet search engine are presented to the subscriber on prospect list screen 810 in the form of a list of company records, including company records 812 a-b.

Prospect list screen 810 also includes an area 813 providing information regarding ICP details, an area 814 providing information regarding prospect list filters and an area 815 providing information regarding prospect list ranking. Area 813 identifies the one or more initial search constraints (e.g., the selected industries and/or industry verticals) and the user-specified search terms/phrases (e.g., the mandatory terms/phrases, the ideal terms/phrases and the exclusionary terms/phrases) that were employed to perform the search that produced the search results presented (e.g., the list of company records). To the extent the subscriber has already specified one or more filtering criteria, area 814 identifies the filtering criteria that have been locally applied to the search results to produce the list of company records. To the extent the subscriber has already specified one or more ranking criteria, area 815 identifies the ranking criteria that have been locally applied to the search results to produce the ordering of the list of company records.

As companies will often be miscategorized, intentionally or unintentionally, after reviewing the search results, the subscriber can perform further refinements to winnow the list of company records to those deemed most relevant. For example, in the context of the example subscriber searching for appropriate residential sprinkler installers, a tree trimming company may be included among the search results due to the company's website including a discussion about how using sprinklers too much or too little can damage trees. So, despite the fact that the tree trimming company's website included the mandatory term/phrase “sprinkler,” the tree trimming company is not a suitable prospect for the subscriber. In such a scenario, the subscriber can either go back and refine the main ICP via ICP screen 800 and rerun the search or can use a quicker path by specifying filtering criteria. In one embodiment, the filtering criteria may include the ability to provide exclusionary filtering, for example, under a sub-industries heading. So, the subscriber could exclude the tree trimming company from showing up in future search results by selecting “tree trimming” and “tree maintenance” as excluded sub-industries, thereby eliminating such companies from being included in future search results associated with this ICP.

FIG. 8F illustrates a portion of search result filtering screen 820 of a user interface of a SaaS based Internet search engine in accordance with an embodiment of the present invention. The list of company records provided responsive to a search conducted based on the ICP may be quite extensive. In order to identify those most relevant to the needs of the subscriber, search result filtering screen 820 provides several ways for the subscriber to further refine the characteristics/attributes of companies to be included in a filtered list of prospects.

In the context of the present example, search result filtering screen 820 includes five sets of radio buttons 821, 822, 823, 824 and 825. Radio buttons 821 allow the subscriber to select a condition relating to a specified number of years or range of years the prospect should be in business to be included in the filtered list of prospects. For example, the subscriber may specify a number or range of years and select one of the following conditions: “more than,” “less than,” or “between.” Alternatively, the subscriber can “clear” this filter.

Similarly, radio buttons 822 allow the subscriber to select a condition relating to a specified number of employees or range of employees the prospect should employ to be included in the filtered list of prospects. For example, the subscriber may specify a number or range of employees and select one of the following conditions: “more than,” “less than,” or “between.” Alternatively, the subscriber can “clear” this filter.

Radio buttons 823 allow the subscriber to select a condition relating to a specified number of locations the ideal prospect should have to be included in the filtered list of prospects. For example, the subscriber may specify a number or range of locations and select one of the following conditions: “more than,” “less than,” or “between.” Alternatively, the subscriber can “clear” this filter.

Radio buttons 824 allow the subscriber to select whether only headquarters locations for companies are to be included in the filtered list of prospects.

Radio buttons 825 allow the subscriber to select conditions relating to the public or private nature of the companies that are to be included in the filtered list of prospects. For example, the subscriber may specify only publicly traded companies are to be included in the filtered list of prospects, only privately owned companies are to be included in the filtered list of prospects, both publicly traded companies and privately owned companies are to be included in the filtered list of prospects or “any” type of company, which would presumably include government owned companies, may be included within the filtered list of prospects.

Once the subscriber has specified all the filtering criteria that they desire, the “Apply Filters” button 826 may be selected to locally filter the search results based on the specified filtering criteria. After the subscriber has selected button 826, prospect list screen 810 may be presented to the subscriber with the filtered list of prospects.
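The local filtering described with reference to FIG. 8F might be sketched as below, purely for illustration. The function names, the dictionary-based company records, the attribute keys (`years_in_business`, `employees`) and the tuple encoding of each filter are hypothetical, and treating “between” as inclusive of both endpoints is an assumption not stated above.

```python
def passes_condition(value, condition, first=None, second=None):
    """Evaluate one numeric filter in the style of radio buttons 821-823."""
    if condition == "clear":        # filter cleared: every value passes
        return True
    if condition == "more than":
        return value > first
    if condition == "less than":
        return value < first
    if condition == "between":      # assumption: both endpoints inclusive
        return first <= value <= second
    raise ValueError(f"unknown condition: {condition!r}")


def apply_filters(records, filters):
    """Locally filter already-retrieved company records.

    filters maps an attribute name to a (condition, first, second) tuple.
    A record is kept only if it passes every configured filter.
    """
    return [
        record for record in records
        if all(passes_condition(record[attr], *spec)
               for attr, spec in filters.items())
    ]


records = [
    {"name": "A", "years_in_business": 12, "employees": 8},
    {"name": "B", "years_in_business": 3, "employees": 40},
    {"name": "C", "years_in_business": 20, "employees": 200},
]
filters = {
    "years_in_business": ("more than", 5, None),
    "employees": ("between", 5, 50),
}
print([r["name"] for r in apply_filters(records, filters)])
```

Because the filters operate on records that have already been returned, no new search of real-time Internet content is needed in this sketch, consistent with the local filtering described above.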

FIG. 8G illustrates a portion of search result ranking screen 830 of a user interface of a SaaS based Internet search engine in accordance with an embodiment of the present invention. As noted above, the list of company records provided responsive to a search conducted based on the ICP may be quite extensive, even after user-specified filtering criteria have been applied. In an effort to sort/rank the list of company records in accordance with the priorities of the subscriber, search result ranking screen 830 may provide mechanisms for the subscriber to further specify the relative importance of various characteristics/attributes of companies and/or the search terms/phrases. In the context of the present example, search result ranking screen 830 includes a set of radio buttons for each of multiple technologies 831 (e.g., LiveChat, RayChat, Bold Chat, HubSpot, Salesforce, Marketo, and Zendesk Chat) used by companies that might be evidenced by the source code of their website content. Those skilled in the art will appreciate the specific technologies listed on search result ranking screen 830 are merely exemplary in nature as there are over one thousand web technologies in over sixty categories (e.g., accounting, advertising networks, analytics tools, blogs, cache tools, captchas, content management systems, databases, dev tools, ecommerce platforms, network devices, network storage, operating systems, payment processors, programming languages, server software, video players, web frameworks, web mail, widgets, etc.). Depending upon the needs of a particular implementation, various subsets or all of such web technologies might be capable of individual ranking in accordance with embodiments of the present invention.

For each “nice to have technology,” the subscriber can select a relative importance from one to three. A ranking score can then be calculated for each company in the list of company records by adding one to three points to their score for each of the technologies found to be used by the company as a result of analyzing the source code of their website content with a web technology profiler tool, for example.
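As a non-limiting sketch of the technology-based scoring just described, the following assumes the set of technologies detected for each company has already been produced by some web technology profiler tool; the function names and the record layout are hypothetical.

```python
def technology_score(detected, importance):
    """Add one to three points for each rated technology a company uses.

    detected:   set of technology names found for one company.
    importance: dict mapping technology name -> importance (1, 2 or 3).
    """
    for tech, weight in importance.items():
        if weight not in (1, 2, 3):
            raise ValueError(f"importance for {tech!r} must be 1, 2 or 3")
    return sum(weight for tech, weight in importance.items()
               if tech in detected)


def rank_by_technology(companies, importance):
    """Return the companies ordered from highest to lowest score."""
    return sorted(
        companies,
        key=lambda c: technology_score(c["technologies"], importance),
        reverse=True,
    )


importance = {"HubSpot": 3, "LiveChat": 1, "Salesforce": 2}
companies = [
    {"name": "y", "technologies": {"Salesforce"}},
    {"name": "z", "technologies": set()},
    {"name": "x", "technologies": {"HubSpot", "LiveChat"}},
]
print([c["name"] for c in rank_by_technology(companies, importance)])
```

Since Python's `sorted` is stable, companies with equal scores keep their original relative order in this sketch; how ties are actually broken is not specified above.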

The ranking score can also be influenced as a result of repetition of the user-specified search terms/phrases via radio buttons 832. In one embodiment, as described with reference to FIG. 8H, the subscriber may be provided with the ability to individually rate the importance of ideal terms and phrases.

Once the subscriber has specified all the ranking criteria that they desire, the “Apply Ranking” button 833 may be selected to locally rank the search results based on the specified ranking criteria and corresponding scoring of the web site content of the companies in the list of company records. After the subscriber has selected button 833, prospect list screen 810 may be presented to the subscriber with the ranked list of prospects.

FIG. 8H illustrates a portion of search result ranking screen 830 of a user interface of a SaaS based Internet search engine in accordance with an alternative embodiment of the present invention. In this example, individual ideal terms/phrases 834 a-c can be assigned scores in accordance with their perceived importance to the subscriber. For each appearance of such ideal terms/phrases in web content of a company, the company's ranking score can be increased accordingly.
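One possible reading of the per-appearance scoring of FIG. 8H is sketched below. The function name is hypothetical, and counting non-overlapping, case-insensitive occurrences of each term/phrase is an assumption about what constitutes an "appearance."

```python
import re


def ideal_term_score(pages, term_scores):
    """Sum each ideal term's score once per appearance in the web content.

    pages:       list of strings, one per web page of the company's domain.
    term_scores: dict mapping ideal term/phrase -> subscriber-assigned score.
    """
    total = 0
    for page in pages:
        text = page.lower()
        for term, score in term_scores.items():
            # re.escape treats the term literally; findall counts
            # non-overlapping occurrences.
            occurrences = len(re.findall(re.escape(term.lower()), text))
            total += score * occurrences
    return total


pages = [
    "Water savings start here. Our water savings plans fit any yard.",
    "Drought resistance tips for homeowners.",
]
term_scores = {"water savings": 2, "drought resistance": 3}
print(ideal_term_score(pages, term_scores))
```

Here “water savings” appears twice and “drought resistance” once, so the sketch yields 2 × 2 + 3 = 7 for this company.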

According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired or program logic to implement the techniques.

FIG. 9 is a block diagram that illustrates a computer system 900 upon which an embodiment of the invention may be implemented. Computer system 900 may be representative of all or a portion of the computing resources associated with a web server (e.g., web server(s) 210), a search server (e.g., search server(s) 215), an application server (e.g., application server(s) 220), or end user work stations (e.g., computers 112 a-n). Notably, components of computer system 900 described herein are meant only to exemplify various possibilities. In no way should exemplary computer system 900 limit the scope of the present invention. In the context of the present example, computer system 900 includes a bus 902 or other communication mechanism for communicating information, and a hardware processor 904 coupled with bus 902 for processing information. Hardware processor 904 may be, for example, a general purpose microprocessor.

Computer system 900 also includes a main memory 906, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 902 for storing information and instructions to be executed by processor 904. Main memory 906 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 904. Such instructions, when stored in non-transitory storage media accessible to processor 904, render computer system 900 into a special-purpose machine that is customized to perform the operations specified in the instructions.

Computer system 900 further includes a read only memory (ROM) 908 or other static storage device coupled to bus 902 for storing static information and instructions for processor 904. A storage device 910, e.g., a magnetic disk, optical disk or flash disk (made of flash memory chips), is provided and coupled to bus 902 for storing information and instructions.

Computer system 900 may be coupled via bus 902 to a display 912, e.g., a cathode ray tube (CRT), Liquid Crystal Display (LCD), Organic Light-Emitting Diode Display (OLED), Digital Light Processing Display (DLP) or the like, for displaying information to a computer user. An input device 914, including alphanumeric and other keys, is coupled to bus 902 for communicating information and command selections to processor 904. Another type of user input device is cursor control 916, such as a mouse, a trackball, a trackpad, or cursor direction keys for communicating direction information and command selections to processor 904 and for controlling cursor movement on display 912. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allow the device to specify positions in a plane.

Removable storage media 940 can be any kind of external storage media, including, but not limited to, hard-drives, floppy drives, IOMEGA® Zip Drives, Compact Disc-Read Only Memory (CD-ROM), Compact Disc-Re-Writable (CD-RW), Digital Video Disk-Read Only Memory (DVD-ROM), USB flash drives and the like.

Computer system 900 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware or program logic which in combination with the computer system causes or programs computer system 900 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 900 in response to processor 904 executing one or more sequences of one or more instructions contained in main memory 906. Such instructions may be read into main memory 906 from another storage medium, such as storage device 910. Execution of the sequences of instructions contained in main memory 906 causes processor 904 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.

The term “storage media” as used herein refers to any non-transitory media that store data or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media or volatile media. Non-volatile media includes, for example, optical, magnetic or flash disks, such as storage device 910. Volatile media includes dynamic memory, such as main memory 906. Common forms of storage media include, for example, a flexible disk, a hard disk, a solid state drive, a magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.

Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 902. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.

Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 904 for execution. For example, the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 900 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 902. Bus 902 carries the data to main memory 906, from which processor 904 retrieves and executes the instructions. The instructions received by main memory 906 may optionally be stored on storage device 910 either before or after execution by processor 904.

Computer system 900 also includes a communication interface 918 coupled to bus 902. Communication interface 918 provides a two-way data communication coupling to a network link 920 that is connected to a local network 922. For example, communication interface 918 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 918 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 918 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

Network link 920 typically provides data communication through one or more networks to other data devices. For example, network link 920 may provide a connection through local network 922 to a host computer 924 or to data equipment operated by an Internet Service Provider (ISP) 926. ISP 926 in turn provides data communication services through the worldwide packet data communication network now commonly referred to as the “Internet” 928. Local network 922 and Internet 928 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 920 and through communication interface 918, which carry the digital data to and from computer system 900, are example forms of transmission media.

Computer system 900 can send messages and receive data, including program code, through the network(s), network link 920 and communication interface 918. In the Internet example, a server 930 might transmit a requested code for an application program through Internet 928, ISP 926, local network 922 and communication interface 918. The received code may be executed by processor 904 as it is received, or stored in storage device 910, or other non-volatile storage for later execution.

While embodiments of the present invention have been illustrated and described, it will be clear that the invention is not limited to these specific embodiments. Numerous modifications, changes, variations, substitutions, and equivalents will be apparent to those skilled in the art, without departing from the spirit and scope of the invention, as described in the claims.

What is claimed is:
1. A method performed by one or more processing resources of a Software as a Service (SaaS) based Internet search engine, the method comprising: receiving a search request from an end user of a subscriber of the SaaS based Internet search engine, wherein the search request includes (i) one or more initial constraints specified by the end user that are to be used by the SaaS based Internet search engine to constrain a search of real-time Internet content to be performed based on the search request to a domain name space and (ii) one or more user-specified search terms or phrases; and responsive to receipt of the search request, providing the end user with feedback regarding a subset of potential outcomes of the search request by generating a structured search query containing a finite number of domain names defining the domain name space and one or more search terms or phrases to be applied to the real-time Internet content within the domain name space by: identifying company records matching the one or more initial constraints and satisfying one or more sampling criteria by searching a company database having stored therein characteristics/attributes regarding a plurality of companies; and incorporating into the structured search query: a plurality of domain names each representing a web site domain of a company of those of the plurality of companies associated with the identified matching records by extracting the plurality of domain names from the identified matching company records; and the one or more user-specified search terms or phrases; and causing the subset of potential outcomes to be displayed to the end user by performing a limited search of the real-time Internet content based on the structured search query.
2. The method of claim 1, further comprising: responsive to receipt of confirmation from the end user to proceed with a full search based on the search request, regenerating the structured search query without limiting said identifying based on the one or more sampling criteria; and performing the search of the real-time Internet content based on the structured search query.
3. The method of claim 1, wherein the one or more sampling criteria comprises a limit on a number of the domain names.
4. The method of claim 1, wherein the one or more sampling criteria comprises a geographic area.
5. The method of claim 4, wherein the geographic area is defined by one or more postal codes.
6. The method of claim 4, wherein the geographic area is defined by a circle having a predefined or configurable radius and a center at a particular location.
7. The method of claim 6, wherein the particular location comprises an address, a postal code, or a set of coordinates.
8. The method of claim 1, wherein a first match criterion of the match criteria associated with the one or more user-specified search terms or phrases identifies a first set of the one or more user-specified search terms or phrases as mandatory terms or phrases that must be present in the real-time Internet content for the match criteria to be satisfied.
9. The method of claim 8, wherein a second match criterion of the match criteria associated with the one or more user-specified search terms or phrases identifies a second set of the one or more user-specified search terms or phrases as ideal terms or phrases that are desired to be present but need not be present in the real-time Internet content for the match criteria to be satisfied.
10. The method of claim 9, wherein a third match criterion of the match criteria associated with the one or more user-specified search terms or phrases identifies a third set of the one or more user-specified search terms or phrases as exclusionary terms or phrases that must not be present in the real-time Internet content for the match criteria to be satisfied.
11. A non-transitory computer-readable storage medium embodying a set of instructions, which when executed by one or more processing resources associated with a Software as a Service (SaaS) based Internet search engine, causes the one or more processing resources to perform a method comprising: receiving a search request from an end user of a subscriber of the SaaS based Internet search engine, wherein the search request includes (i) one or more initial constraints specified by the end user that are to be used by the SaaS based Internet search engine to constrain a search of real-time Internet content to be performed based on the search request to a domain name space and (ii) one or more user-specified search terms or phrases; and responsive to receipt of the search request, providing the end user with feedback regarding a subset of potential outcomes of the search request by generating a structured search query containing a finite number of domain names defining the domain name space and one or more search terms or phrases to be applied to the real-time Internet content within the domain name space by: identifying company records matching the one or more initial constraints and satisfying one or more sampling criteria by searching a company database having stored therein characteristics/attributes regarding a plurality of companies; and incorporating into the structured search query: a plurality of domain names each representing a web site domain of a company of those of the plurality of companies associated with the identified matching records by extracting the plurality of domain names from the identified matching company records; and the one or more user-specified search terms or phrases; and causing the subset of potential outcomes to be displayed to the end user by performing a limited search of the real-time Internet content based on the structured search query.
12. The non-transitory computer-readable storage medium of claim 11, wherein the method further comprises: responsive to receipt of confirmation from the end user to proceed with a full search based on the search request, regenerating the structured search query without limiting said identifying based on the one or more sampling criteria; and performing the search of the real-time Internet content based on the structured search query.
13. The non-transitory computer-readable storage medium of claim 11, wherein the one or more sampling criteria comprises a limit on a number of the domain names.
14. The non-transitory computer-readable storage medium of claim 11, wherein the one or more sampling criteria comprises a geographic area.
15. The non-transitory computer-readable storage medium of claim 14, wherein the geographic area is defined by one or more postal codes.
16. The non-transitory computer-readable storage medium of claim 14, wherein the geographic area is defined by a circle having a predefined or configurable radius and a center at a particular location.
17. The non-transitory computer-readable storage medium of claim 16, wherein the particular location comprises an address, a postal code, or a set of coordinates.
18. The non-transitory computer-readable storage medium of claim 11, wherein a first match criterion of the match criteria associated with the one or more user-specified search terms or phrases identifies a first set of the one or more user-specified search terms or phrases as mandatory terms or phrases that must be present in the real-time Internet content for the match criteria to be satisfied.
19. The method of claim 18, wherein a second match criterion of the match criteria associated with the one or more user-specified search terms or phrases identifies a second set of the one or more user-specified search terms or phrases as ideal terms or phrases that are desired to be present but need not be present in the real-time Internet content for the match criteria to be satisfied.
20. The non-transitory computer-readable storage medium of claim 19, wherein a third match criterion of the match criteria associated with the one or more user-specified search terms or phrases identifies a third set of the one or more user-specified search terms or phrases as exclusionary terms or phrases that must not be present in the real-time Internet content for the match criteria to be satisfied.